Exploiting the Web as the multilingual corpus for unknown query translation

نویسندگان

  • Jenq-Haur Wang
  • Jei-Wen Teng
  • Wen-Hsiang Lu
  • Lee-Feng Chien
چکیده

Users’ cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms and Web query terms and in assisting bilingual lexicon construction for a real digital library system. Wang, J. H., Teng, J. W, Lu, W. H. and Chien, L. F. 3 Exploiting the Web as the Multilingual Corpus for Unknown Query Translation

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Transitive Model for Extracting Translation Equivalents of Web Queries through Anchor Text Mining

One of the existing difficulties of cross-language information retrieval (CLIR) and Web search is the lack of appropriate translations of new terminology and proper names. Different from conventional approaches, in our previous research we developed an approach for exploiting Web anchor texts as live bilingual corpora and reducing the existing difficulties of query term translation. Although We...

متن کامل

Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries

This paper proposes an efficient client-server-based query translation approach to allowing more feasible implementation of cross-language information retrieval (CLIR) services in digital library (DL) systems. A centralized query translation server is constructed to process the translation requests of cross-lingual queries from connected DL systems. To extract translations not covered by standa...

متن کامل

Exploiting the Web as Parallel Corpora for Cross- Language Information Retrieval

The expansion of the Web creates more requirements for Cross-Language Information Retrieval (CLIR). Query translation is the key problem. Previous studies have shown that query translation can be done by exploiting a large set of parallel texts. However, the problem arisen is the unavailability of large parallel corpora for many languages. In this paper, we describe a mining system that automat...

متن کامل

Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction

Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology e...

متن کامل

Using Query-Relevant Documents Pairs for Cross-Lingual Information Retrieval

The world wide web is a natural setting for cross-lingual information retrieval. The European Union is a typical example of a multilingual scenario, where multiple users have to deal with information published in at least 20 languages. Given queries in some source language and a target corpus in another language, the typical approximation consists in translating either the query or the target d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JASIST

دوره 57  شماره 

صفحات  -

تاریخ انتشار 2006